## [1] 1599 12
## [1] "fixed.acidity" "volatile.acidity" "citric.acid"
## [4] "residual.sugar" "chlorides" "free.sulfur.dioxide"
## [7] "total.sulfur.dioxide" "density" "pH"
## [10] "sulphates" "alcohol" "quality"
## 'data.frame': 1599 obs. of 12 variables:
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
## fixed.acidity volatile.acidity citric.acid residual.sugar
## Min. : 4.60 Min. :0.1200 Min. :0.000 Min. : 0.900
## 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090 1st Qu.: 1.900
## Median : 7.90 Median :0.5200 Median :0.260 Median : 2.200
## Mean : 8.32 Mean :0.5278 Mean :0.271 Mean : 2.539
## 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420 3rd Qu.: 2.600
## Max. :15.90 Max. :1.5800 Max. :1.000 Max. :15.500
## chlorides free.sulfur.dioxide total.sulfur.dioxide
## Min. :0.01200 Min. : 1.00 Min. : 6.00
## 1st Qu.:0.07000 1st Qu.: 7.00 1st Qu.: 22.00
## Median :0.07900 Median :14.00 Median : 38.00
## Mean :0.08747 Mean :15.87 Mean : 46.47
## 3rd Qu.:0.09000 3rd Qu.:21.00 3rd Qu.: 62.00
## Max. :0.61100 Max. :72.00 Max. :289.00
## density pH sulphates alcohol
## Min. :0.9901 Min. :2.740 Min. :0.3300 Min. : 8.40
## 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500 1st Qu.: 9.50
## Median :0.9968 Median :3.310 Median :0.6200 Median :10.20
## Mean :0.9967 Mean :3.311 Mean :0.6581 Mean :10.42
## 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300 3rd Qu.:11.10
## Max. :1.0037 Max. :4.010 Max. :2.0000 Max. :14.90
## quality
## Min. :3.000
## 1st Qu.:5.000
## Median :6.000
## Mean :5.636
## 3rd Qu.:6.000
## Max. :8.000
## fixed.acidity volatile.acidity citric.acid
## 0.209275508 0.339243549 0.718888086
## residual.sugar chlorides free.sulfur.dioxide
## 0.555350954 0.538094924 0.658910770
## total.sulfur.dioxide density pH
## 0.707916662 0.001893494 0.046626755
## sulphates alcohol quality
## 0.257551132 0.102242090 0.143287121
## citric.acid
## 0.7188881
## density
## 0.001893494
## pH
## 0.04662676
According to the project requirements, we need to analyze which chemical properties (a minimum of 8 independent variables) influence the quality of red wines. The dataset already has 12 variables, which I treat as follows:
quality - as the dependent or response variable
all others - the independent variables
We also found that ‘density’ and ‘pH’ have very small coefficients of variation (CV = sd / mean, about 0.19% and 4.7% respectively) compared to the other features in the dataset, so I set ‘density’ and ‘pH’ aside as independent chemical variables in the univariate plots and analysis. The remaining variables are:
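The CV values above are each column's standard deviation divided by its mean. A minimal base-R sketch of that computation, using a hypothetical helper `coef_var()` and a small made-up data frame (not the wine data):

```r
# Coefficient of variation (CV = sd / mean) for every numeric column.
# `coef_var` is a hypothetical helper; `toy` is illustrative, not the wine data.
coef_var <- function(df) {
  num <- df[sapply(df, is.numeric)]   # keep numeric columns only
  sapply(num, function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE))
}

toy <- data.frame(
  density = c(0.996, 0.997, 0.998, 0.997),
  alcohol = c(9.4, 9.8, 11.0, 12.5)
)
cv <- coef_var(toy)
print(round(cv, 4))

# Columns whose CV falls below a chosen cutoff (5% here is an assumption)
print(names(cv)[cv < 0.05])
```

Applied to the numeric columns of `red`, the same function should reproduce the CV table printed above (e.g. about 0.0019 for density).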
## [1] "fixed.acidity" "volatile.acidity" "citric.acid"
## [4] "residual.sugar" "chlorides" "free.sulfur.dioxide"
## [7] "total.sulfur.dioxide" "sulphates" "alcohol"
## [10] "quality"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.636 6.000 8.000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.50 10.20 10.42 11.10 14.90
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3300 0.5500 0.6200 0.6581 0.7300 2.0000
## 96.37%
## 0.999926
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 22.00 38.00 46.47 62.00 289.00
## 99.5%
## 151
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 7.00 14.00 15.87 21.00 72.00
## 99.5%
## 53.01
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01200 0.07000 0.07900 0.08747 0.09000 0.61100
## 99.5%
## 0.41401
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.900 1.900 2.200 2.539 2.600 15.500
## 99.5%
## 11.019
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.090 0.260 0.271 0.420 1.000
## 99.5%
## 0.74
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1200 0.3900 0.5200 0.5278 0.6400 1.5800
## 99.5%
## 1.09025
##
## 0.36 0.58 0.59 0.43 0.5 0.6
## 38 38 39 43 46 47
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.60 7.10 7.90 8.32 9.20 15.90
## 'data.frame': 1599 obs. of 10 variables:
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
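There are 1599 red wine samples (observations) in the dataset, with 9 chemical features:
fixed.acidity, volatile.acidity, citric.acid, residual.sugar, chlorides, free.sulfur.dioxide, total.sulfur.dioxide, sulphates, alcohol
two physical measurement variables, which depend on the chemical composition:
density, pH
and one main feature under investigation (the human-rated red wine quality score):
quality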
The main feature of interest in the dataset is quality. I’d like to explore how the different chemical features contribute to good or bad wine quality, and also how they contribute to the wine’s physical properties, e.g. density and pH (which are kept in the full data frame `red`, as opposed to the reduced `reds`).
Density and pH are themselves largely determined by the composition of the other 9 chemical features.
For this reason, even though they cannot themselves cause differences in wine quality, they may still correlate strongly with quality within the limited range of variation of the winemaking process.
Hopefully they can be paired with other features to reveal notable correlations with wine quality.
I have not created any new variables so far. Instead, I removed the variable ‘X’ (the sample sequence number from the CSV import) from the original data frame, since I’m not interested in analyzing sample order (I assume the samples are in random order).
## [1] "integer"
## [1] 3 8
Since quality is an ordinal integer variable (min. 3, max. 8), I will later convert it into an additional categorical variable (factor) called ‘qua’.
This will be very helpful for visualizing how other variables change with the new categorical variable ‘qua’, whether through color, faceted histograms or boxplots.
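A minimal sketch of creating ‘qua’, assuming an ordered factor (my choice, because quality is ordinal) and a made-up quality vector rather than the wine data:

```r
# Convert an integer quality score into an ordered factor.
# `quality` here is a made-up vector; levels = 3:8 keeps empty levels (e.g. 4).
quality <- c(5L, 5L, 5L, 6L, 5L, 7L, 8L, 3L)
qua <- factor(quality, levels = 3:8, ordered = TRUE)

print(levels(qua))  # "3" "4" "5" "6" "7" "8"
print(table(qua))   # counts per level, including zero for level 4

# Ordered factors support comparisons, which plain factors do not
print(qua[7] > qua[1])
```

With the report's data frame this would be something like `red$qua <- factor(red$quality, levels = 3:8, ordered = TRUE)` (a hypothetical line, not taken from the original analysis). Declaring the levels explicitly keeps the color scale and facet order stable even if a subset lacks some scores.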
## fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
## [1,] 4.6 0.12 0 0.9 0.012
## [2,] 15.9 1.58 1 15.5 0.611
## free.sulfur.dioxide total.sulfur.dioxide density pH sulphates
## [1,] 1 6 0.99007 2.74 0.33
## [2,] 72 289 1.00369 4.01 2.00
## alcohol quality
## [1,] 8.4 3
## [2,] 14.9 8
The red wine dataset is already tidy, and the range check above shows no abnormal values (e.g. negative values).
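The range table above has the layout that `sapply(df, range)` produces (row 1 = min, row 2 = max). A small sketch of that check plus an explicit test for negative values, on a made-up data frame:

```r
# Column-wise min/max and a negative-value check.
# `toy` is illustrative only, not the wine data.
toy <- data.frame(
  chlorides = c(0.076, 0.098, 0.012, 0.611),
  pH        = c(3.51, 3.20, 2.74, 4.01)
)
rng <- sapply(toy, range)   # 2 x ncol matrix: row 1 = min, row 2 = max
print(rng)

has_negative <- sapply(toy, function(x) any(x < 0, na.rm = TRUE))
print(has_negative)         # FALSE for every column here
```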
I assume the data for all features fall within an acceptable measurement error range, although I cannot confirm this, since there are no repeated measurements of any single sample.
In the univariate plots above I did observe long-tailed distributions in a few features; I cut off the extreme tail values (a negligible percentage of samples) to better visualize the target features.
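The 99.5% quantiles printed earlier suggest a natural cutoff for these long tails. As a sketch, assuming the trimming drops values above a chosen upper quantile (the exact rule behind the original plots isn't shown), a hypothetical helper `trim_upper()` could look like this, demonstrated on synthetic long-tailed data:

```r
# Drop values above an upper quantile before plotting a long-tailed variable.
# `trim_upper` is a hypothetical helper; prob = 0.995 mirrors the quantiles above.
trim_upper <- function(x, prob = 0.995) {
  cutoff <- quantile(x, probs = prob, names = FALSE, na.rm = TRUE)
  x[x <= cutoff]
}

set.seed(1)
x <- c(rlnorm(999, meanlog = 0, sdlog = 0.5), 50)  # synthetic long tail plus one outlier
kept <- trim_upper(x)
cat("kept", length(kept), "of", length(x), "values\n")
cat("max before:", max(x), " max after:", round(max(kept), 2), "\n")
```

For plotting only, an alternative is to keep the data intact and just limit the axis to the quantile, so summary statistics are still computed on all samples.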
## fixed.acidity volatile.acidity citric.acid
## fixed.acidity 1.00000000 -0.256130895 0.67170343
## volatile.acidity -0.25613089 1.000000000 -0.55249568
## citric.acid 0.67170343 -0.552495685 1.00000000
## residual.sugar 0.11477672 0.001917882 0.14357716
## chlorides 0.09370519 0.061297772 0.20382291
## free.sulfur.dioxide -0.15379419 -0.010503827 -0.06097813
## total.sulfur.dioxide -0.11318144 0.076470005 0.03553302
## density 0.66804729 0.022026232 0.36494718
## pH -0.68297819 0.234937294 -0.54190414
## sulphates 0.18300566 -0.260986685 0.31277004
## alcohol -0.06166827 -0.202288027 0.10990325
## quality 0.12405165 -0.390557780 0.22637251
## residual.sugar chlorides free.sulfur.dioxide
## fixed.acidity 0.114776724 0.093705186 -0.153794193
## volatile.acidity 0.001917882 0.061297772 -0.010503827
## citric.acid 0.143577162 0.203822914 -0.060978129
## residual.sugar 1.000000000 0.055609535 0.187048995
## chlorides 0.055609535 1.000000000 0.005562147
## free.sulfur.dioxide 0.187048995 0.005562147 1.000000000
## total.sulfur.dioxide 0.203027882 0.047400468 0.667666450
## density 0.355283371 0.200632327 -0.021945831
## pH -0.085652422 -0.265026131 0.070377499
## sulphates 0.005527121 0.371260481 0.051657572
## alcohol 0.042075437 -0.221140545 -0.069408354
## quality 0.013731637 -0.128906560 -0.050656057
## total.sulfur.dioxide density pH
## fixed.acidity -0.11318144 0.66804729 -0.68297819
## volatile.acidity 0.07647000 0.02202623 0.23493729
## citric.acid 0.03553302 0.36494718 -0.54190414
## residual.sugar 0.20302788 0.35528337 -0.08565242
## chlorides 0.04740047 0.20063233 -0.26502613
## free.sulfur.dioxide 0.66766645 -0.02194583 0.07037750
## total.sulfur.dioxide 1.00000000 0.07126948 -0.06649456
## density 0.07126948 1.00000000 -0.34169933
## pH -0.06649456 -0.34169933 1.00000000
## sulphates 0.04294684 0.14850641 -0.19664760
## alcohol -0.20565394 -0.49617977 0.20563251
## quality -0.18510029 -0.17491923 -0.05773139
## sulphates alcohol quality
## fixed.acidity 0.183005664 -0.06166827 0.12405165
## volatile.acidity -0.260986685 -0.20228803 -0.39055778
## citric.acid 0.312770044 0.10990325 0.22637251
## residual.sugar 0.005527121 0.04207544 0.01373164
## chlorides 0.371260481 -0.22114054 -0.12890656
## free.sulfur.dioxide 0.051657572 -0.06940835 -0.05065606
## total.sulfur.dioxide 0.042946836 -0.20565394 -0.18510029
## density 0.148506412 -0.49617977 -0.17491923
## pH -0.196647602 0.20563251 -0.05773139
## sulphates 1.000000000 0.09359475 0.25139708
## alcohol 0.093594750 1.00000000 0.47616632
## quality 0.251397079 0.47616632 1.00000000
Observations:
## fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
## [1,] 0.1240516 -0.3905578 0.2263725 0.01373164 -0.1289066
## free.sulfur.dioxide total.sulfur.dioxide density pH
## [1,] -0.05065606 -0.1851003 -0.1749192 -0.05773139
## sulphates alcohol
## [1,] 0.2513971 0.4761663
## alcohol volatile.acidity sulphates citric.acid total.sulfur.dioxide
## [1,] 0.4761663 -0.3905578 0.2513971 0.2263725 -0.1851003
## density chlorides fixed.acidity pH free.sulfur.dioxide
## [1,] -0.1749192 -0.1289066 0.1240516 -0.05773139 -0.05065606
## residual.sugar
## [1,] 0.01373164
## alcohol volatile.acidity sulphates citric.acid total.sulfur.dioxide
## [1,] 0.4761663 -0.3905578 0.2513971 0.2263725 -0.1851003
## chlorides fixed.acidity
## [1,] -0.1289066 0.1240516
## fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
## [1,] 0.6680473 0.02202623 0.3649472 0.3552834 0.2006323
## free.sulfur.dioxide total.sulfur.dioxide density pH sulphates
## [1,] -0.02194583 0.07126948 1 -0.3416993 0.1485064
## alcohol quality
## [1,] -0.4961798 -0.1749192
## fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
## [1,] -0.6829782 0.2349373 -0.5419041 -0.08565242 -0.2650261
## free.sulfur.dioxide total.sulfur.dioxide density pH sulphates
## [1,] 0.0703775 -0.06649456 -0.3416993 1 -0.1966476
## alcohol quality
## [1,] 0.2056325 -0.05773139
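The filtered, sorted list of correlations with quality shown above can be produced by ranking predictors on |r|. A sketch with a hypothetical helper `rank_cor()`; the 0.1 cutoff and the exclusion of density and pH are my assumptions, chosen because they are consistent with the 7-variable list above. The demo uses synthetic data:

```r
# Rank predictors by |Pearson r| with a response and keep those above a cutoff.
# Assumes all non-excluded columns are numeric (cor() fails on factors).
rank_cor <- function(df, response, exclude = character(0), cutoff = 0.1) {
  preds <- setdiff(names(df), c(response, exclude))
  r <- sapply(preds, function(p) cor(df[[p]], df[[response]]))
  r <- r[order(abs(r), decreasing = TRUE)]   # strongest first, sign kept
  r[abs(r) > cutoff]
}

set.seed(42)
n <- 200
toy <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
toy$y <- 0.8 * toy$a - 0.5 * toy$b + rnorm(n, sd = 0.5)
print(round(rank_cor(toy, "y", exclude = "c"), 3))
```

Assuming `red` holds only numeric columns at that point, `rank_cor(red, "quality", exclude = c("density", "pH"))` should return the same 7 variables, in the same order, as the list above.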
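Broadly, I classified the 12 features into 3 classes:
9 chemical variables: fixed.acidity, volatile.acidity, citric.acid, residual.sugar, chlorides, free.sulfur.dioxide, total.sulfur.dioxide, sulphates, alcohol
2 physical measurement variables: density, pH
1 response variable: quality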
Observations:
Certain strong correlations exist among the 9 chemical features, notably:
## [1] 0.6676665
## fixed.acidity volatile.acidity
## [1,] 0.6717034 -0.5524957
Density and pH are strongly correlated with certain chemical features, notably:
## fixed.acidity alcohol citric.acid residual.sugar
## [1,] 0.6680473 -0.4961798 0.3649472 0.3552834
## fixed.acidity citric.acid chlorides volatile.acidity alcohol
## [1,] -0.6829782 -0.5419041 -0.2650261 0.2349373 0.2056325
Quality has notable correlations with 7 chemical features:
## alcohol volatile.acidity sulphates citric.acid total.sulfur.dioxide
## [1,] 0.4761663 -0.3905578 0.2513971 0.2263725 -0.1851003
## chlorides fixed.acidity
## [1,] -0.1289066 0.1240516
Quality has very weak correlations with the 2 other chemical features:
## free.sulfur.dioxide residual.sugar
## [1,] -0.05065606 0.01373164
Other interesting relationships include:
Alcohol is somewhat positively correlated with pH (r ≈ 0.21), presumably because alcohol itself is less acidic (pH near or above 7) than the wine.
fixed.acidity <=> citric.acid
## [1] 0.6717034
fixed.acidity <=> density
## [1] 0.6680473
fixed.acidity <=> pH
## [1] -0.6829782
volatile.acidity <=> citric.acid
## [1] -0.5524957
volatile.acidity <=> quality
## [1] -0.3905578
citric.acid <=> pH
## [1] -0.5419041
free.sulfur.dioxide <=> total.sulfur.dioxide
## [1] 0.6676665
alcohol <=> density
## [1] -0.4961798
alcohol <=> quality
## [1] 0.4761663
fixed.acidity <=> pH
## [1] -0.6829782
alcohol <=> quality
## [1] 0.4761663
## alcohol volatile.acidity sulphates citric.acid total.sulfur.dioxide
## [1,] 0.4761663 -0.3905578 0.2513971 0.2263725 -0.1851003
## chlorides fixed.acidity
## [1,] -0.1289066 0.1240516
##
## Calls:
## m1: lm(formula = quality ~ alcohol, data = red)
## m2: lm(formula = quality ~ alcohol + volatile.acidity, data = red)
## m3: lm(formula = quality ~ alcohol + volatile.acidity + sulphates,
## data = red)
## m4: lm(formula = quality ~ alcohol + volatile.acidity + sulphates +
## citric.acid, data = red)
## m5: lm(formula = quality ~ alcohol + volatile.acidity + sulphates +
## citric.acid + total.sulfur.dioxide, data = red)
## m6: lm(formula = quality ~ alcohol + volatile.acidity + sulphates +
## citric.acid + total.sulfur.dioxide + chlorides, data = red)
## m7: lm(formula = quality ~ alcohol + volatile.acidity + sulphates +
## citric.acid + total.sulfur.dioxide + chlorides + fixed.acidity,
## data = red)
##
## =====================================================================================================
## m1 m2 m3 m4 m5 m6 m7
## -----------------------------------------------------------------------------------------------------
## (Intercept) 1.875*** 3.095*** 2.611*** 2.646*** 2.843*** 2.985*** 2.652***
## (0.175) (0.184) (0.196) (0.201) (0.205) (0.206) (0.240)
## alcohol 0.361*** 0.314*** 0.309*** 0.309*** 0.295*** 0.276*** 0.288***
## (0.017) (0.016) (0.016) (0.016) (0.016) (0.017) (0.017)
## volatile.acidity -1.384*** -1.221*** -1.265*** -1.222*** -1.104*** -1.173***
## (0.095) (0.097) (0.113) (0.112) (0.115) (0.118)
## sulphates 0.679*** 0.696*** 0.721*** 0.908*** 0.888***
## (0.101) (0.103) (0.103) (0.111) (0.111)
## citric.acid -0.079 -0.043 0.065 -0.203
## (0.104) (0.104) (0.106) (0.145)
## total.sulfur.dioxide -0.002*** -0.002*** -0.002***
## (0.001) (0.001) (0.001)
## chlorides -1.763*** -1.576***
## (0.403) (0.408)
## fixed.acidity 0.037**
## (0.014)
## -----------------------------------------------------------------------------------------------------
## R-squared 0.2 0.3 0.3 0.3 0.3 0.4 0.4
## adj. R-squared 0.2 0.3 0.3 0.3 0.3 0.3 0.4
## sigma 0.7 0.7 0.7 0.7 0.7 0.7 0.7
## F 468.3 370.4 268.9 201.8 167.0 143.9 124.9
## p 0.0 0.0 0.0 0.0 0.0 0.0 0.0
## Log-likelihood -1721.1 -1621.8 -1599.4 -1599.1 -1589.7 -1580.2 -1576.5
## Deviance 805.9 711.8 692.1 691.9 683.8 675.7 672.6
## AIC 3448.1 3251.6 3208.8 3210.2 3193.5 3176.4 3171.1
## BIC 3464.2 3273.1 3235.7 3242.4 3231.1 3219.4 3219.5
## N 1599 1599 1599 1599 1599 1599 1599
## =====================================================================================================
The variables in the final linear model (m7) account for about 35% of the variance in the quality of red wines (R² = 0.355, adjusted R² = 0.352; the table above rounds both to 0.4).
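The m1 to m7 sequence adds one predictor at a time, in descending order of |r| with quality. A sketch of that forward sequence in base R (`reformulate()`, `lm()`, `AIC()`), reading R² directly from `summary()` instead of a one-decimal table. The helper name `fit_nested()` and the toy data are hypothetical:

```r
# Fit nested linear models, adding predictors one at a time in the given order,
# and collect R-squared, adjusted R-squared and AIC for each step.
fit_nested <- function(df, response, preds) {
  rows <- lapply(seq_along(preds), function(k) {
    f <- reformulate(preds[seq_len(k)], response = response)
    m <- lm(f, data = df)
    s <- summary(m)
    data.frame(model = paste0("m", k), r2 = s$r.squared,
               adj_r2 = s$adj.r.squared, aic = AIC(m),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

set.seed(7)
n <- 300
toy <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
toy$y <- 1 + 0.6 * toy$a - 0.4 * toy$b + rnorm(n)   # c is pure noise
res <- fit_nested(toy, "y", c("a", "b", "c"))
print(res, digits = 3)
```

For the report, `fit_nested(red, "quality", c("alcohol", "volatile.acidity", "sulphates", "citric.acid", "total.sulfur.dioxide", "chlorides", "fixed.acidity"))` would rebuild m1 to m7. R² never decreases along a nested sequence like this, so adjusted R² and AIC are the more useful columns for deciding when to stop adding terms.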
## red$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.400 9.725 9.925 9.955 10.580 11.000
## --------------------------------------------------------
## red$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.00 9.60 10.00 10.27 11.00 13.10
## --------------------------------------------------------
## red$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.5 9.4 9.7 9.9 10.2 14.9
## --------------------------------------------------------
## red$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.80 10.50 10.63 11.30 14.00
## --------------------------------------------------------
## red$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.20 10.80 11.50 11.47 12.10 14.00
## --------------------------------------------------------
## red$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.80 11.32 12.15 12.09 12.88 14.00
I chose this plot for the final section because it supports the hypothesis that a higher alcohol level is associated with better wine quality.
I think this result makes sense, as wines with an alcohol level of around 12% are just right for the taste :)
Presumably, wines with higher ‘alcohol’ should have a higher average (or median) ‘quality’.
However, ‘alcohol’ is not a categorical variable, so instead I plot the density distribution of ‘alcohol’ for each quality level.
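Before plotting the densities, a quick numeric check of the same idea is the median alcohol per quality level (the per-quality summaries shown earlier are `by()` output in the same spirit). A sketch with `tapply()` on made-up values:

```r
# Median of a continuous variable within each level of a discrete one.
# Made-up values, not the wine data.
alcohol <- c(9.4, 9.8, 10.0, 11.5, 12.1, 12.8, 9.6, 10.4)
quality <- c(5, 5, 6, 7, 8, 8, 5, 6)

med <- tapply(alcohol, quality, median)
print(med)             # one median per quality level, ordered by level
print(all(diff(med) > 0))  # TRUE if the median rises with every quality step
```

With the report's data this would be `tapply(red$alcohol, red$quality, median)`, which should match the Median row of each per-quality summary above (e.g. 12.15 for quality 8).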
Previously, we had the following summary of m7:
##
## Call:
## lm(formula = quality ~ alcohol + volatile.acidity + sulphates +
## citric.acid + total.sulfur.dioxide + chlorides + fixed.acidity,
## data = red)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.72028 -0.37289 -0.06422 0.45556 2.01920
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.6516821 0.2402263 11.038 < 2e-16 ***
## alcohol 0.2876488 0.0170070 16.914 < 2e-16 ***
## volatile.acidity -1.1733907 0.1177349 -9.966 < 2e-16 ***
## sulphates 0.8877424 0.1108192 8.011 2.18e-15 ***
## citric.acid -0.2030352 0.1452195 -1.398 0.162270
## total.sulfur.dioxide -0.0019662 0.0005278 -3.725 0.000202 ***
## chlorides -1.5757580 0.4080225 -3.862 0.000117 ***
## fixed.acidity 0.0367162 0.0136220 2.695 0.007105 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6502 on 1591 degrees of freedom
## Multiple R-squared: 0.3546, Adjusted R-squared: 0.3518
## F-statistic: 124.9 on 7 and 1591 DF, p-value: < 2.2e-16
Then, after adding density to m7, we have the following summary of m8:
##
## Call:
## lm(formula = quality ~ alcohol + volatile.acidity + sulphates +
## citric.acid + total.sulfur.dioxide + chlorides + fixed.acidity +
## density, data = red)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.73073 -0.36653 -0.06734 0.45255 1.98186
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.816e+01 1.508e+01 1.867 0.062042 .
## alcohol 2.680e-01 2.059e-02 13.019 < 2e-16 ***
## volatile.acidity -1.137e+00 1.196e-01 -9.505 < 2e-16 ***
## sulphates 9.163e-01 1.120e-01 8.179 5.8e-16 ***
## citric.acid -1.982e-01 1.452e-01 -1.365 0.172321
## total.sulfur.dioxide -1.907e-03 5.287e-04 -3.606 0.000320 ***
## chlorides -1.584e+00 4.078e-01 -3.883 0.000107 ***
## fixed.acidity 5.473e-02 1.729e-02 3.167 0.001572 **
## density -2.558e+01 1.512e+01 -1.692 0.090896 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6498 on 1590 degrees of freedom
## Multiple R-squared: 0.3558, Adjusted R-squared: 0.3525
## F-statistic: 109.8 on 8 and 1590 DF, p-value: < 2.2e-16
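m8 adds a single term (density) to m7, so the two models are nested and can be compared with an F-test via `anova()`. A sketch on synthetic data in which the extra variable `z` is largely determined by an existing predictor, loosely mimicking density's dependence on the chemical features (an illustration, not the wine data):

```r
# Compare two nested linear models with an F-test and AIC.
set.seed(11)
n <- 400
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$z <- 0.9 * toy$x1 + rnorm(n, sd = 0.3)   # z is mostly a function of x1
toy$y <- 2 + 0.5 * toy$x1 - 0.3 * toy$x2 + rnorm(n)

small <- lm(y ~ x1 + x2, data = toy)
big   <- lm(y ~ x1 + x2 + z, data = toy)

cmp <- anova(small, big)   # F-test for the added term(s)
print(cmp)
print(c(AIC_small = AIC(small), AIC_big = AIC(big)))
```

With the report's objects the call would be `anova(m7, m8)`. Because m8 adds only one coefficient, that F-test is equivalent to the t-test on density in the m8 summary (F = t², so p ≈ 0.091), i.e. density adds little once the chemical features are already in the model.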
The red wine dataset contains information on 1599 samples. I started by examining the individual variables and classifying them into three classes: 9 chemical variables, 2 observed measurement variables (density and pH), and 1 response variable (quality). Then I explored the relationships between quality and the other variables, as well as the major relationships among the other variables, via plots and statistical analysis. Eventually I ended up with a linear model that predicts wine quality from alcohol, volatile.acidity, sulphates, citric.acid, total.sulfur.dioxide, chlorides and fixed.acidity. The model’s accuracy is limited (R² ≈ 0.35), leaving room for improvement.
The assumption that density and pH are determined purely by the other 9 chemical variables may not hold in the real world. The water used by different wineries or vineyards may differ, which could lead to a different base pH (and perhaps a slightly different density) to start with. To this extent pH can sometimes be an independent variable, but it is not treated as one in this analysis.
From the results, I think there are other key features, not included in the dataset, that contribute to wine quality. In short, I can NOT imagine that the quality of red wine is largely determined by the 11 features in this dataset; in the real world, red wine quality can also depend heavily on other variables, e.g. grape type, blend, flavor additives, etc.